Overview

Dataset statistics

Number of variables: 40
Number of observations: 244
Missing cells: 678
Missing cells (%): 6.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 357.8 KiB
Average record size in memory: 1.5 KiB
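
The two derived figures in this table follow from the raw counts. A minimal sketch that recomputes them, assuming the percentage is taken over all cells (variables × observations) and the record size is the total size divided by the number of rows; the variable names are illustrative only:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 40
n_observations = 244
missing_cells = 678
total_size_kib = 357.8

total_cells = n_variables * n_observations          # every (row, column) position
missing_pct = 100 * missing_cells / total_cells     # share of cells that are missing
avg_record_kib = total_size_kib / n_observations    # memory per observation

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_kib:.1f} KiB")
```

Both results round to the values shown in the report.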

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2015" Constant
ID_REGIONA has constant value "" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
ID_OCUPA_N has a high cardinality: 98 distinct values High cardinality
DEXAME has a high cardinality: 128 distinct values High cardinality
DTRATA has a high cardinality: 51 distinct values High cardinality
SEM_NOT is highly correlated with SEM_PRI High correlation
SG_UF_NOT is highly correlated with ID_MUNICIP High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP High correlation
COPAISINF is highly correlated with PMM High correlation
PMM is highly correlated with COPAISINF High correlation
COUFINF is highly correlated with RESULT and 8 other fields High correlation
PMM is highly correlated with CS_RACA and 4 other fields High correlation
CS_RACA is highly correlated with PMM and 4 other fields High correlation
RESULT is highly correlated with COUFINF and 12 other fields High correlation
AT_SINTOMA is highly correlated with RESULT and 3 other fields High correlation
SEM_NOT is highly correlated with DTRATA and 1 other field High correlation
DTRATA is highly correlated with COUFINF and 15 other fields High correlation
AT_LAMINA is highly correlated with RESULT and 5 other fields High correlation
ID_OCUPA_N is highly correlated with COUFINF and 7 other fields High correlation
COMUNINF is highly correlated with COUFINF and 9 other fields High correlation
CLASSI_FIN is highly correlated with COUFINF and 10 other fields High correlation
LOC_INF is highly correlated with COUFINF and 11 other fields High correlation
COPAISINF is highly correlated with PMM and 8 other fields High correlation
DSTRAESQUE is highly correlated with PMM and 11 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 7 other fields High correlation
CS_GESTANT is highly correlated with CS_SEXO High correlation
TRA_ESQUEM is highly correlated with COUFINF and 9 other fields High correlation
AT_ATIVIDA is highly correlated with RESULT and 8 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with COUFINF and 11 other fields High correlation
ID_MN_RESI is highly correlated with CS_RACA and 3 other fields High correlation
COUFINF is highly correlated with ID_REGIONA and 10 other fields High correlation
ID_REGIONA is highly correlated with COUFINF and 24 other fields High correlation
DTRATA is highly correlated with COUFINF and 14 other fields High correlation
CS_ESCOL_N is highly correlated with ID_REGIONA and 6 other fields High correlation
DSTRAESQUE is highly correlated with ID_REGIONA and 7 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields High correlation
CS_SEXO is highly correlated with ID_REGIONA and 7 other fields High correlation
SG_UF is highly correlated with COUFINF and 24 other fields High correlation
CS_RACA is highly correlated with ID_REGIONA and 6 other fields High correlation
RESULT is highly correlated with ID_REGIONA and 13 other fields High correlation
AT_SINTOMA is highly correlated with ID_REGIONA and 10 other fields High correlation
SG_UF_NOT is highly correlated with ID_REGIONA and 6 other fields High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields High correlation
AT_LAMINA is highly correlated with ID_REGIONA and 10 other fields High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields High correlation
TPAUTOCTO is highly correlated with ID_REGIONA and 12 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields High correlation
CS_GESTANT is highly correlated with ID_REGIONA and 7 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields High correlation
TRA_ESQUEM is highly correlated with ID_REGIONA and 10 other fields High correlation
AT_ATIVIDA is highly correlated with ID_REGIONA and 9 other fields High correlation
CLASSI_FIN is highly correlated with ID_REGIONA and 13 other fields High correlation
PCRUZ is highly correlated with ID_REGIONA and 12 other fields High correlation
DT_INVEST has 244 (100.0%) missing values Missing
PMM has 190 (77.9%) missing values Missing
DT_ENCERRA has 244 (100.0%) missing values Missing
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 175 (71.7%) zeros Zeros

Reproduction

Analysis started: 2021-07-06 18:52:03.424048
Analysis finished: 2021-07-06 18:52:24.155716
Duration: 20.73 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
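
The reported duration can be checked against the two timestamps. A minimal sketch using only the Python standard library; the variable names are illustrative and the timestamps are copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the "Reproduction" section.
started = datetime.fromisoformat("2021-07-06 18:52:03.424048")
finished = datetime.fromisoformat("2021-07-06 18:52:24.155716")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```

Rounded to two decimals, the difference matches the stated duration.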

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size13.9 KiB
2
244 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters244
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

ValueCountFrequency (%)
2244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2244
100.0%

Most occurring characters

ValueCountFrequency (%)
2244
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number244
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2244
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common244
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2244
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII244
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2244
100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size18.2 KiB
B54
244 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters732
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54244
100.0%

Most occurring characters

ValueCountFrequency (%)
B244
33.3%
5244
33.3%
4244
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number488
66.7%
Uppercase Letter244
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5244
50.0%
4244
50.0%
Uppercase Letter
ValueCountFrequency (%)
B244
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common488
66.7%
Latin244
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5244
50.0%
4244
50.0%
Latin
ValueCountFrequency (%)
B244
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII732
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B244
33.3%
5244
33.3%
4244
33.3%
Distinct134
Distinct (%)54.9%
Missing0
Missing (%)0.0%
Memory size2.0 KiB
Minimum2015-01-02 00:00:00
Maximum2015-12-26 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 48
Distinct (%): 19.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201517.9016
Minimum: 201453
Maximum: 201551
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 201453
5-th percentile: 201503
Q1: 201509
Median: 201512
Q3: 201525.25
95-th percentile: 201548
Maximum: 201551
Range: 98
Interquartile range (IQR): 16.25

Descriptive statistics

Standard deviation: 14.50665228
Coefficient of variation (CV): 7.198691609 × 10^-5
Kurtosis: 1.163243467
Mean: 201517.9016
Median Absolute Deviation (MAD): 5
Skewness: 0.6077509865
Sum: 49170368
Variance: 210.4429603
Monotonicity: Increasing
Histogram with fixed size bins (bins=48)
Value               Count   Frequency (%)
201509              35      14.3%
201510              18      7.4%
201508              16      6.6%
201514              13      5.3%
201511              10      4.1%
201507              10      4.1%
201506              7       2.9%
201545              6       2.5%
201515              6       2.5%
201513              6       2.5%
Other values (38)   117     48.0%

Value               Count   Frequency (%)
201453              1       0.4%
201501              4       1.6%
201502              4       1.6%
201503              5       2.0%
201504              4       1.6%
201505              6       2.5%
201506              7       2.9%
201507              10      4.1%
201508              16      6.6%
201509              35      14.3%

Value               Count   Frequency (%)
201551              2       0.8%
201550              6       2.5%
201549              4       1.6%
201548              2       0.8%
201547              2       0.8%
201546              3       1.2%
201545              6       2.5%
201544              1       0.4%
201540              2       0.8%
201539              4       1.6%
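
SEM_NOT values such as 201453 and 201551 look like a four-digit year followed by a two-digit week number. That reading is an assumption based on the value pattern; the report itself does not state the encoding. Under it, the reported Range of 98 is a difference between codes, not a count of weeks. A minimal sketch with a hypothetical helper, `split_epi_week`:

```python
def split_epi_week(code):
    """Split a YYYYWW code into (year, week). Assumes the year-week encoding."""
    return divmod(code, 100)

# Minimum and maximum taken from the SEM_NOT statistics.
minimum, maximum = 201453, 201551

numeric_range = maximum - minimum      # difference between the raw codes
first = split_epi_week(minimum)        # (year, week) of the earliest code
last = split_epi_week(maximum)         # (year, week) of the latest code

print(numeric_range, first, last)
```

Treating the column as a plain number therefore distorts statistics such as the range, mean, and standard deviation, which is worth keeping in mind when reading the figures for SEM_NOT and SEM_PRI.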

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.7 KiB
2015
244 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters976
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2015
2nd row2015
3rd row2015
4th row2015
5th row2015

Common Values

ValueCountFrequency (%)
2015244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2015244
100.0%

Most occurring characters

ValueCountFrequency (%)
2244
25.0%
0244
25.0%
1244
25.0%
5244
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number976
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2244
25.0%
0244
25.0%
1244
25.0%
5244
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common976
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2244
25.0%
0244
25.0%
1244
25.0%
5244
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII976
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2244
25.0%
0244
25.0%
1244
25.0%
5244
25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Memory size14.2 KiB
33
243 
53
 
1

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters488
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33243
99.6%
531
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33243
99.6%
531
 
0.4%

Most occurring characters

ValueCountFrequency (%)
3487
99.8%
51
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number488
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3487
99.8%
51
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common488
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3487
99.8%
51
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII488
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3487
99.8%
51
 
0.2%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 18
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 331223.2049
Minimum: 330010
Maximum: 530010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 330010
5-th percentile: 330240
Q1: 330340
Median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 530010
Range: 200000
Interquartile range (IQR): 115

Descriptive statistics

Standard deviation: 12778.74515
Coefficient of variation (CV): 0.03858046466
Kurtosis: 243.9726141
Mean: 331223.2049
Median Absolute Deviation (MAD): 0
Skewness: 15.61918918
Sum: 80818462
Variance: 163296327.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value               Count   Frequency (%)
330455              166     68.0%
330240              25      10.2%
330340              24      9.8%
330330              7       2.9%
330250              3       1.2%
330490              3       1.2%
330170              3       1.2%
330185              2       0.8%
330610              2       0.8%
330150              1       0.4%
Other values (8)    8       3.3%

Value               Count   Frequency (%)
330010              1       0.4%
330040              1       0.4%
330070              1       0.4%
330150              1       0.4%
330170              3       1.2%
330185              2       0.8%
330240              25      10.2%
330250              3       1.2%
330330              7       2.9%
330340              24      9.8%

Value               Count   Frequency (%)
530010              1       0.4%
330630              1       0.4%
330610              2       0.8%
330490              3       1.2%
330455              166     68.0%
330452              1       0.4%
330430              1       0.4%
330350              1       0.4%
330340              24      9.8%
330330              7       2.9%

ID_REGIONA
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.7 KiB
244 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block
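
The ID_REGIONA figures (Distinct 1, Missing 0, Max length 0) suggest that every value is an empty string rather than a true null, which is why the column is flagged as Constant and not as Missing. A minimal sketch of that distinction, using a hypothetical toy column rather than the real data and plain Python rather than the profiler's own logic:

```python
# Toy stand-in for ID_REGIONA: 244 empty strings (hypothetical, not the real data).
column = [""] * 244

# Only None is treated as missing here, so empty strings do not count.
missing = sum(v is None for v in column)
distinct = len(set(column))            # "" is a single distinct value

# Mapping blank strings to None before profiling changes the picture.
cleaned = [v if v.strip() else None for v in column]
missing_after = sum(v is None for v in cleaned)

print(missing, distinct, missing_after)
```

The same reasoning would apply to ID_RG_RESI, which shows the same pattern in the report.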

ID_UNIDADE
Real number (ℝ≥0)

Distinct: 77
Distinct (%): 31.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2840699.361
Minimum: 69
Maximum: 7642415
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 69
5-th percentile: 2269783
Q1: 2276534
Median: 2288338
Q3: 3005992
95-th percentile: 5465428.35
Maximum: 7642415
Range: 7642346
Interquartile range (IQR): 729458

Descriptive statistics

Standard deviation: 1345263.264
Coefficient of variation (CV): 0.47356763
Kurtosis: 2.584384319
Mean: 2840699.361
Median Absolute Deviation (MAD): 11804
Skewness: 1.554737087
Sum: 693130644
Variance: 1.809733249 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
2288338             87      35.7%
2276534             23      9.4%
5462886             18      7.4%
3005992             13      5.3%
2271885             9       3.7%
2272784             8       3.3%
2269783             4       1.6%
2273365             2       0.8%
3607720             2       0.8%
3211649             2       0.8%
Other values (67)   76      31.1%

Value               Count   Frequency (%)
69                  1       0.4%
76                  1       0.4%
12513               1       0.4%
12548               1       0.4%
12599               1       0.4%
12831               1       0.4%
26050               1       0.4%
2269546             1       0.4%
2269554             1       0.4%
2269651             2       0.8%

Value               Count   Frequency (%)
7642415             1       0.4%
7458940             1       0.4%
6995462             1       0.4%
6938124             1       0.4%
6793231             1       0.4%
6753469             2       0.8%
6734014             1       0.4%
6427138             1       0.4%
6146376             1       0.4%
5476321             2       0.8%
Distinct153
Distinct (%)62.7%
Missing0
Missing (%)0.0%
Memory size2.0 KiB
Minimum2014-09-30 00:00:00
Maximum2015-12-25 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 52
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201515.3607
Minimum: 201440
Maximum: 201551
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 201440
5-th percentile: 201502
Q1: 201507
Median: 201510
Q3: 201524
95-th percentile: 201545
Maximum: 201551
Range: 111
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 16.53158373
Coefficient of variation (CV): 8.203634538 × 10^-5
Kurtosis: 3.969377108
Mean: 201515.3607
Median Absolute Deviation (MAD): 5
Skewness: -0.5789124383
Sum: 49169748
Variance: 273.2932605
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
201508              23      9.4%
201509              21      8.6%
201505              17      7.0%
201507              17      7.0%
201506              11      4.5%
201513              10      4.1%
201503              8       3.3%
201514              8       3.3%
201510              7       2.9%
201537              7       2.9%
Other values (42)   115     47.1%

Value               Count   Frequency (%)
201440              1       0.4%
201451              1       0.4%
201452              2       0.8%
201453              1       0.4%
201501              5       2.0%
201502              6       2.5%
201503              8       3.3%
201504              3       1.2%
201505              17      7.0%
201506              11      4.5%

Value               Count   Frequency (%)
201551              1       0.4%
201550              2       0.8%
201549              4       1.6%
201548              1       0.4%
201547              1       0.4%
201546              3       1.2%
201545              4       1.6%
201544              4       1.6%
201543              1       0.4%
201541              1       0.4%
Distinct233
Distinct (%)95.5%
Missing0
Missing (%)0.0%
Memory size2.0 KiB
Minimum1940-04-17 00:00:00
Maximum2014-05-05 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

Distinct: 64
Distinct (%): 26.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4035.172131
Minimum: 3010
Maximum: 4074
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 3010
5-th percentile: 4010
Q1: 4028
Median: 4039
Q3: 4052
95-th percentile: 4066.85
Maximum: 4074
Range: 1064
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 67.88009189
Coefficient of variation (CV): 0.01682210565
Kurtosis: 216.4219365
Mean: 4035.172131
Median Absolute Deviation (MAD): 12
Skewness: -14.28346634
Sum: 984582
Variance: 4607.706874
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
4028                12      4.9%
4033                11      4.5%
4040                9       3.7%
4034                9       3.7%
4057                8       3.3%
4042                7       2.9%
4031                7       2.9%
4051                7       2.9%
4052                7       2.9%
4002                6       2.5%
Other values (54)   161     66.0%

Value               Count   Frequency (%)
3010                1       0.4%
4002                6       2.5%
4004                1       0.4%
4007                2       0.8%
4008                2       0.8%
4010                2       0.8%
4013                2       0.8%
4014                1       0.4%
4015                1       0.4%
4016                3       1.2%

Value               Count   Frequency (%)
4074                2       0.8%
4072                3       1.2%
4071                2       0.8%
4069                1       0.4%
4068                3       1.2%
4067                2       0.8%
4066                1       0.4%
4065                4       1.6%
4064                1       0.4%
4063                1       0.4%
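
NU_IDADE_N values sit between 4002 and 4074 with a single 3010, which is consistent with a composite code whose leading digit is a time unit and whose remaining digits are the amount. The unit mapping below (1 = hours, 2 = days, 3 = months, 4 = years) is an assumption about the source system's data dictionary; the report does not state it. A minimal sketch with a hypothetical helper, `decode_age`:

```python
# Assumed unit codes for the leading digit (not confirmed by the report).
AGE_UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}

def decode_age(code):
    """Split a composite age code into (amount, unit) under the assumed mapping."""
    unit_digit, amount = divmod(code, 1000)
    return amount, AGE_UNITS[unit_digit]

# Minimum, median and maximum from the NU_IDADE_N statistics.
for code in (3010, 4039, 4074):
    print(code, decode_age(code))
```

If the assumption holds, the extreme skewness and kurtosis reported for this variable come from mixing units in one numeric column, not from the ages themselves.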

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Memory size15.9 KiB
M
173 
F
71 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters244
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowF
3rd rowM
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
M173
70.9%
F71
29.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m173
70.9%
f71
29.1%

Most occurring characters

ValueCountFrequency (%)
M173
70.9%
F71
29.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter244
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M173
70.9%
F71
29.1%

Most occurring scripts

ValueCountFrequency (%)
Latin244
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M173
70.9%
F71
29.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII244
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M173
70.9%
F71
29.1%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size13.9 KiB
6
189 
5
48 
9
 
6
1
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters244
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row6
2nd row5
3rd row6
4th row6
5th row6

Common Values

ValueCountFrequency (%)
6189
77.5%
548
 
19.7%
96
 
2.5%
11
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6189
77.5%
548
 
19.7%
96
 
2.5%
11
 
0.4%

Most occurring characters

ValueCountFrequency (%)
6189
77.5%
548
 
19.7%
96
 
2.5%
11
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number244
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6189
77.5%
548
 
19.7%
96
 
2.5%
11
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common244
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6189
77.5%
548
 
19.7%
96
 
2.5%
11
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII244
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6189
77.5%
548
 
19.7%
96
 
2.5%
11
 
0.4%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Memory size15.8 KiB
1
120 
9
71 
4
23 
2
21 
 
7
Other values (2)
 
2

Length

Max length1
Median length1
Mean length0.9713114754
Min length0

Characters and Unicode

Total characters237
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique2 ?
Unique (%)0.8%

Sample

1st row4
2nd row1
3rd row1
4th row2
5th row4

Common Values

Value               Count   Frequency (%)
1                   120     49.2%
9                   71      29.1%
4                   23      9.4%
2                   21      8.6%
(empty string)      7       2.9%
5                   1       0.4%
3                   1       0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1120
50.6%
971
30.0%
423
 
9.7%
221
 
8.9%
51
 
0.4%
31
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1120
50.6%
971
30.0%
423
 
9.7%
221
 
8.9%
31
 
0.4%
51
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number237
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1120
50.6%
971
30.0%
423
 
9.7%
221
 
8.9%
31
 
0.4%
51
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common237
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1120
50.6%
971
30.0%
423
 
9.7%
221
 
8.9%
31
 
0.4%
51
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII237
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1120
50.6%
971
30.0%
423
 
9.7%
221
 
8.9%
31
 
0.4%
51
 
0.4%

CS_ESCOL_N
Categorical

HIGH CORRELATION

Distinct12
Distinct (%)4.9%
Missing0
Missing (%)0.0%
Memory size14.2 KiB
08
97 
09
58 
06
19 
13 
07
11 
Other values (7)
46 

Length

Max length2
Median length2
Mean length1.893442623
Min length0

Characters and Unicode

Total characters462
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row10
2nd row08
3rd row08
4th row07
5th row08

Common Values

Value               Count   Frequency (%)
08                  97      39.8%
09                  58      23.8%
06                  19      7.8%
(empty string)      13      5.3%
07                  11      4.5%
05                  11      4.5%
10                  8       3.3%
03                  8       3.3%
01                  7       2.9%
02                  5       2.0%
Other values (2)    7       2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0897
42.0%
0958
25.1%
0619
 
8.2%
0711
 
4.8%
0511
 
4.8%
108
 
3.5%
038
 
3.5%
017
 
3.0%
025
 
2.2%
044
 
1.7%

Most occurring characters

ValueCountFrequency (%)
0234
50.6%
897
21.0%
958
 
12.6%
619
 
4.1%
115
 
3.2%
711
 
2.4%
511
 
2.4%
38
 
1.7%
25
 
1.1%
44
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number462
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0234
50.6%
897
21.0%
958
 
12.6%
619
 
4.1%
115
 
3.2%
711
 
2.4%
511
 
2.4%
38
 
1.7%
25
 
1.1%
44
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Common462
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0234
50.6%
897
21.0%
958
 
12.6%
619
 
4.1%
115
 
3.2%
711
 
2.4%
511
 
2.4%
38
 
1.7%
25
 
1.1%
44
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII462
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0234
50.6%
897
21.0%
958
 
12.6%
619
 
4.1%
115
 
3.2%
711
 
2.4%
511
 
2.4%
38
 
1.7%
25
 
1.1%
44
 
0.9%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.2 KiB
33
244 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters488
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33244
100.0%

Most occurring characters

ValueCountFrequency (%)
3488
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number488
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3488
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common488
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3488
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII488
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3488
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 26
Distinct (%): 10.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330376.1762
Minimum: 330010
Maximum: 330630
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 330010
5-th percentile: 330170
Q1: 330330
Median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 330630
Range: 620
Interquartile range (IQR): 125

Descriptive statistics

Standard deviation: 116.4247832
Coefficient of variation (CV): 0.0003524006618
Kurtosis: 0.4681075846
Mean: 330376.1762
Median Absolute Deviation (MAD): 0
Skewness: -0.9495002607
Sum: 80611787
Variance: 13554.73013
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Value               Count   Frequency (%)
330455              126     51.6%
330340              27      11.1%
330240              24      9.8%
330330              20      8.2%
330170              6       2.5%
330190              5       2.0%
330490              4       1.6%
330350              3       1.2%
330185              3       1.2%
330250              3       1.2%
Other values (16)   23      9.4%

Value               Count   Frequency (%)
330010              1       0.4%
330023              1       0.4%
330040              3       1.2%
330070              2       0.8%
330130              1       0.4%
330150              3       1.2%
330170              6       2.5%
330185              3       1.2%
330190              5       2.0%
330200              1       0.4%

Value               Count   Frequency (%)
330630              1       0.4%
330610              2       0.8%
330590              1       0.4%
330580              1       0.4%
330560              1       0.4%
330490              4       1.6%
330455              126     51.6%
330452              2       0.8%
330430              1       0.4%
330360              1       0.4%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.7 KiB
244 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size13.9 KiB
1
244 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters244
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1244
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1244
100.0%

Most occurring characters

ValueCountFrequency (%)
1244
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number244
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1244
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common244
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1244
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII244
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1244
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing244
Missing (%)100.0%
Memory size2.0 KiB

ID_OCUPA_N
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct98
Distinct (%)40.2%
Missing0
Missing (%)0.0%
Memory size15.0 KiB
65 
999991
26 
214205
 
12
999993
 
8
262105
 
4
Other values (93)
129 

Length

Max length6
Median length6
Mean length4.401639344
Min length0

Characters and Unicode

Total characters1074
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique69 ?
Unique (%)28.3%

Sample

1st row
2nd row221105
3rd row
4th row782305
5th row262105

Common Values

Value               Count   Frequency (%)
(empty string)      65      26.6%
999991              26      10.7%
214205              12      4.9%
999993              8       3.3%
262105              4       1.6%
241005              4       1.6%
252105              4       1.6%
221105              4       1.6%
999992              3       1.2%
715210              3       1.2%
Other values (88)   111     45.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
99999126
 
14.5%
21420512
 
6.7%
9999938
 
4.5%
2410054
 
2.2%
2521054
 
2.2%
2211054
 
2.2%
2621054
 
2.2%
2231153
 
1.7%
7823053
 
1.7%
7152103
 
1.7%
Other values (87)108
60.3%

Most occurring characters

ValueCountFrequency (%)
1200
18.6%
9199
18.5%
2170
15.8%
5152
14.2%
0145
13.5%
377
 
7.2%
466
 
6.1%
729
 
2.7%
626
 
2.4%
810
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1074
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1200
18.6%
9199
18.5%
2170
15.8%
5152
14.2%
0145
13.5%
377
 
7.2%
466
 
6.1%
729
 
2.7%
626
 
2.4%
810
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Common1074
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1200
18.6%
9199
18.5%
2170
15.8%
5152
14.2%
0145
13.5%
377
 
7.2%
466
 
6.1%
729
 
2.7%
626
 
2.4%
810
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII1074
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1200
18.6%
9199
18.5%
2170
15.8%
5152
14.2%
0145
13.5%
377
 
7.2%
466
 
6.1%
729
 
2.7%
626
 
2.4%
810
 
0.9%

CLASSI_FIN
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size15.8 KiB
2
171 
1
68 
8
 
4
 
1

Length

Max length1
Median length1
Mean length0.9959016393
Min length0

Characters and Unicode

Total characters243
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row2
2nd row1
3rd row2
4th row1
5th row2

Common Values

Value    Count  Frequency (%)
2        171    70.1%
1        68     27.9%
8        4      1.6%
(empty)  1      0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2171
70.4%
168
 
28.0%
84
 
1.6%

Most occurring characters

ValueCountFrequency (%)
2171
70.4%
168
 
28.0%
84
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number243
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2171
70.4%
168
 
28.0%
84
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Common243
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2171
70.4%
168
 
28.0%
84
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII243
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2171
70.4%
168
 
28.0%
84
 
1.6%

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)4.1%
Missing0
Missing (%)0.0%
Memory size16.6 KiB
11
75 
10
72 
4
50 
99
17 
3
Other values (5)
21 

Length

Max length2
Median length2
Mean length1.659836066
Min length0

Characters and Unicode

Total characters405
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row11
2nd row10
3rd row4
4th row11
5th row10

Common Values

Value    Count  Frequency (%)
11       75     30.7%
10       72     29.5%
4        50     20.5%
99       17     7.0%
3        9      3.7%
9        6      2.5%
(empty)  6      2.5%
1        4      1.6%
12       3      1.2%
5        2      0.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1175
31.5%
1072
30.3%
450
21.0%
9917
 
7.1%
39
 
3.8%
96
 
2.5%
14
 
1.7%
123
 
1.3%
52
 
0.8%

Most occurring characters

ValueCountFrequency (%)
1229
56.5%
072
 
17.8%
450
 
12.3%
940
 
9.9%
39
 
2.2%
23
 
0.7%
52
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number405
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1229
56.5%
072
 
17.8%
450
 
12.3%
940
 
9.9%
39
 
2.2%
23
 
0.7%
52
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common405
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1229
56.5%
072
 
17.8%
450
 
12.3%
940
 
9.9%
39
 
2.2%
23
 
0.7%
52
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII405
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1229
56.5%
072
 
17.8%
450
 
12.3%
940
 
9.9%
39
 
2.2%
23
 
0.7%
52
 
0.5%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size15.8 KiB
1
166 
2
69 
 
6
3
 
3

Length

Max length1
Median length1
Mean length0.9754098361
Min length0

Characters and Unicode

Total characters238
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row2
3rd row1
4th row2
5th row2

Common Values

Value    Count  Frequency (%)
1        166    68.0%
2        69     28.3%
(empty)  6      2.5%
3        3      1.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1166
69.7%
269
29.0%
33
 
1.3%

Most occurring characters

ValueCountFrequency (%)
1166
69.7%
269
29.0%
33
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1166
69.7%
269
29.0%
33
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Common238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1166
69.7%
269
29.0%
33
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1166
69.7%
269
29.0%
33
 
1.3%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size15.8 KiB
1
223 
2
 
15
 
6

Length

Max length1
Median length1
Mean length0.9754098361
Min length0

Characters and Unicode

Total characters238
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value    Count  Frequency (%)
1        223    91.4%
2        15     6.1%
(empty)  6      2.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1223
93.7%
215
 
6.3%

Most occurring characters

ValueCountFrequency (%)
1223
93.7%
215
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1223
93.7%
215
 
6.3%

Most occurring scripts

ValueCountFrequency (%)
Common238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1223
93.7%
215
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1223
93.7%
215
 
6.3%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size15.0 KiB
174 
2
58 
1
 
11
3
 
1

Length

Max length1
Median length0
Mean length0.2868852459
Min length0

Characters and Unicode

Total characters70
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row
2nd row2
3rd row
4th row2
5th row

Common Values

Value    Count  Frequency (%)
(empty)  174    71.3%
2        58     23.8%
1        11     4.5%
3        1      0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
258
82.9%
111
 
15.7%
31
 
1.4%

Most occurring characters

ValueCountFrequency (%)
258
82.9%
111
 
15.7%
31
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number70
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
258
82.9%
111
 
15.7%
31
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Common70
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
258
82.9%
111
 
15.7%
31
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII70
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
258
82.9%
111
 
15.7%
31
 
1.4%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct6
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size15.1 KiB
201 
RJ
35 
AM
 
4
AP
 
2
SP
 
1

Length

Max length2
Median length0
Mean length0.3524590164
Min length0

Characters and Unicode

Total characters86
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.8%

Sample

1st row
2nd rowAM
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  201    82.4%
RJ       35     14.3%
AM       4      1.6%
AP       2      0.8%
SP       1      0.4%
MA       1      0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
rj35
81.4%
am4
 
9.3%
ap2
 
4.7%
ma1
 
2.3%
sp1
 
2.3%

Most occurring characters

ValueCountFrequency (%)
R35
40.7%
J35
40.7%
A7
 
8.1%
M5
 
5.8%
P3
 
3.5%
S1
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter86
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R35
40.7%
J35
40.7%
A7
 
8.1%
M5
 
5.8%
P3
 
3.5%
S1
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Latin86
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
R35
40.7%
J35
40.7%
A7
 
8.1%
M5
 
5.8%
P3
 
3.5%
S1
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII86
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R35
40.7%
J35
40.7%
A7
 
8.1%
M5
 
5.8%
P3
 
3.5%
S1
 
1.2%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)4.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.422131148
Minimum0
Maximum176
Zeros175
Zeros (%)71.7%
Negative0
Negative (%)0.0%
Memory size2.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile31
Maximum176
Range176
Interquartile range (IQR)1

Descriptive statistics

Standard deviation24.50980492
Coefficient of variation (CV)3.816459732
Kurtosis26.45366574
Mean6.422131148
Median Absolute Deviation (MAD)0
Skewness5.050835183
Sum1567
Variance600.730537
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)

Common values

Value  Count  Frequency (%)
0      175    71.7%
1      43     17.6%
31     15     6.1%
138    3      1.2%
22     2      0.8%
176    1      0.4%
153    1      0.4%
128    1      0.4%
109    1      0.4%
28     1      0.4%

Minimum 10 values

Value  Count  Frequency (%)
0      175    71.7%
1      43     17.6%
7      1      0.4%
22     2      0.8%
28     1      0.4%
31     15     6.1%
109    1      0.4%
128    1      0.4%
138    3      1.2%
153    1      0.4%

Maximum 10 values

Value  Count  Frequency (%)
176    1      0.4%
153    1      0.4%
138    3      1.2%
128    1      0.4%
109    1      0.4%
31     15     6.1%
28     1      0.4%
22     2      0.8%
7      1      0.4%
1      43     17.6%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct16
Distinct (%)6.6%
Missing0
Missing (%)0.0%
Memory size14.7 KiB
202 
330340
 
14
330290
 
5
330240
 
4
330185
 
3
Other values (11)
 
16

Length

Max length6
Median length0
Mean length1.032786885
Min length0

Characters and Unicode

Total characters252
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7 ?
Unique (%)2.9%

Sample

1st row
2nd row130260
3rd row
4th row
5th row

Common Values

Value             Count  Frequency (%)
(empty)           202    82.8%
330340            14     5.7%
330290            5      2.0%
330240            4      1.6%
330185            3      1.2%
330390            3      1.2%
330250            2      0.8%
160030            2      0.8%
130120            2      0.8%
130260            1      0.4%
Other values (6)  6      2.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
33034014
33.3%
3302905
 
11.9%
3302404
 
9.5%
3301853
 
7.1%
3303903
 
7.1%
3302502
 
4.8%
1600302
 
4.8%
1301202
 
4.8%
1302601
 
2.4%
3305801
 
2.4%
Other values (5)5
 
11.9%

Most occurring characters

ValueCountFrequency (%)
393
36.9%
084
33.3%
419
 
7.5%
216
 
6.3%
113
 
5.2%
510
 
4.0%
98
 
3.2%
85
 
2.0%
63
 
1.2%
71
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number252
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
393
36.9%
084
33.3%
419
 
7.5%
216
 
6.3%
113
 
5.2%
510
 
4.0%
98
 
3.2%
85
 
2.0%
63
 
1.2%
71
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common252
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
393
36.9%
084
33.3%
419
 
7.5%
216
 
6.3%
113
 
5.2%
510
 
4.0%
98
 
3.2%
85
 
2.0%
63
 
1.2%
71
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII252
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
393
36.9%
084
33.3%
419
 
7.5%
216
 
6.3%
113
 
5.2%
510
 
4.0%
98
 
3.2%
85
 
2.0%
63
 
1.2%
71
 
0.4%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct17
Distinct (%)7.0%
Missing0
Missing (%)0.0%
Memory size14.7 KiB
203 
LUAN
 
9
VALE
 
7
LUMI
 
5
SANA
 
3
Other values (12)
 
17

Length

Max length4
Median length0
Mean length0.6680327869
Min length0

Characters and Unicode

Total characters163
Distinct characters20
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8 ?
Unique (%)3.3%

Sample

1st row
2nd row
3rd row
4th rowLUAN
5th row

Common Values

Value             Count  Frequency (%)
(empty)           203    83.2%
LUAN              9      3.7%
VALE              7      2.9%
LUMI              5      2.0%
SANA              3      1.2%
ANGO              3      1.2%
MACA              2      0.8%
MONT              2      0.8%
BENF              2      0.8%
SAO               1      0.4%
Other values (7)  7      2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
luan9
22.0%
vale7
17.1%
lumi5
12.2%
ango3
 
7.3%
sana3
 
7.3%
mont2
 
4.9%
maca2
 
4.9%
benf2
 
4.9%
serr1
 
2.4%
pedr1
 
2.4%
Other values (6)6
14.6%

Most occurring characters

ValueCountFrequency (%)
A34
20.9%
L21
12.9%
N20
12.3%
E15
9.2%
U14
8.6%
M12
 
7.4%
V8
 
4.9%
O7
 
4.3%
I5
 
3.1%
S5
 
3.1%
Other values (10)22
13.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter163
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A34
20.9%
L21
12.9%
N20
12.3%
E15
9.2%
U14
8.6%
M12
 
7.4%
V8
 
4.9%
O7
 
4.3%
I5
 
3.1%
S5
 
3.1%
Other values (10)22
13.5%

Most occurring scripts

ValueCountFrequency (%)
Latin163
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A34
20.9%
L21
12.9%
N20
12.3%
E15
9.2%
U14
8.6%
M12
 
7.4%
V8
 
4.9%
O7
 
4.3%
I5
 
3.1%
S5
 
3.1%
Other values (10)22
13.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII163
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A34
20.9%
L21
12.9%
N20
12.3%
E15
9.2%
U14
8.6%
M12
 
7.4%
V8
 
4.9%
O7
 
4.3%
I5
 
3.1%
S5
 
3.1%
Other values (10)22
13.5%

DEXAME
Categorical

HIGH CARDINALITY

Distinct128
Distinct (%)52.5%
Missing0
Missing (%)0.0%
Memory size16.1 KiB
2015-03-02
 
12
2015-03-06
 
8
2015-02-27
 
8
2015-03-05
 
7
None
 
6
Other values (123)
203 

Length

Max length10
Median length10
Mean length9.852459016
Min length4

Characters and Unicode

Total characters2404
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique72 ?
Unique (%)29.5%

Sample

1st row2015-01-02
2nd row2015-01-06
3rd row2015-01-08
4th row2015-01-09
5th row2015-01-09

Common Values

Value               Count  Frequency (%)
2015-03-02          12     4.9%
2015-03-06          8      3.3%
2015-02-27          8      3.3%
2015-03-05          7      2.9%
None                6      2.5%
2015-02-19          6      2.5%
2015-03-30          5      2.0%
2015-03-09          5      2.0%
2015-02-20          4      1.6%
2015-02-12          4      1.6%
Other values (118)  179    73.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2015-03-0212
 
4.9%
2015-03-068
 
3.3%
2015-02-278
 
3.3%
2015-03-057
 
2.9%
none6
 
2.5%
2015-02-196
 
2.5%
2015-03-305
 
2.0%
2015-03-095
 
2.0%
2015-02-204
 
1.6%
2015-02-124
 
1.6%
Other values (118)179
73.4%

Most occurring characters

ValueCountFrequency (%)
0560
23.3%
-476
19.8%
1398
16.6%
2386
16.1%
5275
11.4%
3102
 
4.2%
951
 
2.1%
444
 
1.8%
639
 
1.6%
726
 
1.1%
Other values (5)47
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1904
79.2%
Dash Punctuation476
 
19.8%
Lowercase Letter18
 
0.7%
Uppercase Letter6
 
0.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0560
29.4%
1398
20.9%
2386
20.3%
5275
14.4%
3102
 
5.4%
951
 
2.7%
444
 
2.3%
639
 
2.0%
726
 
1.4%
823
 
1.2%
Lowercase Letter
ValueCountFrequency (%)
o6
33.3%
n6
33.3%
e6
33.3%
Dash Punctuation
ValueCountFrequency (%)
-476
100.0%
Uppercase Letter
ValueCountFrequency (%)
N6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2380
99.0%
Latin24
 
1.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0560
23.5%
-476
20.0%
1398
16.7%
2386
16.2%
5275
11.6%
3102
 
4.3%
951
 
2.1%
444
 
1.8%
639
 
1.6%
726
 
1.1%
Latin
ValueCountFrequency (%)
N6
25.0%
o6
25.0%
n6
25.0%
e6
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2404
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0560
23.3%
-476
19.8%
1398
16.6%
2386
16.1%
5275
11.4%
3102
 
4.2%
951
 
2.1%
444
 
1.8%
639
 
1.6%
726
 
1.1%
Other values (5)47
 
2.0%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct6
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size15.8 KiB
1
170 
4
44 
2
22 
 
6
8
 
1

Length

Max length1
Median length1
Mean length0.9754098361
Min length0

Characters and Unicode

Total characters238
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.8%

Sample

1st row1
2nd row4
3rd row1
4th row2
5th row1

Common Values

Value    Count  Frequency (%)
1        170    69.7%
4        44     18.0%
2        22     9.0%
(empty)  6      2.5%
8        1      0.4%
7        1      0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1170
71.4%
444
 
18.5%
222
 
9.2%
81
 
0.4%
71
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1170
71.4%
444
 
18.5%
222
 
9.2%
81
 
0.4%
71
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number238
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1170
71.4%
444
 
18.5%
222
 
9.2%
81
 
0.4%
71
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common238
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1170
71.4%
444
 
18.5%
222
 
9.2%
81
 
0.4%
71
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII238
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1170
71.4%
444
 
18.5%
222
 
9.2%
81
 
0.4%
71
 
0.4%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct47
Distinct (%)87.0%
Missing190
Missing (%)77.9%
Infinite0
Infinite (%)0.0%
Mean10146.27778
Minimum1
Maximum100001
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.0 KiB

Quantile statistics

Minimum1
5-th percentile58.4
Q1231.25
median548
Q35520
95-th percentile82772.05
Maximum100001
Range100000
Interquartile range (IQR)5288.75

Descriptive statistics

Standard deviation24717.1395
Coefficient of variation (CV)2.43607952
Kurtosis8.181322948
Mean10146.27778
Median Absolute Deviation (MAD)460
Skewness3.026917605
Sum547899
Variance610936985.1
MonotonicityNot monotonic
Histogram with fixed size bins (bins=47)

Common values

Value              Count  Frequency (%)
301                3      1.2%
64                 3      1.2%
200                2      0.8%
112                2      0.8%
480                2      0.8%
800                1      0.4%
310                1      0.4%
320                1      0.4%
810                1      0.4%
576                1      0.4%
Other values (37)  37     15.2%
(Missing)          190    77.9%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.4%
16     1      0.4%
48     1      0.4%
64     3      1.2%
80     1      0.4%
96     1      0.4%
112    2      0.8%
128    1      0.4%
200    2      0.8%
208    1      0.4%

Maximum 10 values

Value   Count  Frequency (%)
100001  1      0.4%
100000  1      0.4%
93863   1      0.4%
76800   1      0.4%
42320   1      0.4%
17800   1      0.4%
15720   1      0.4%
13160   1      0.4%
12400   1      0.4%
12000   1      0.4%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Memory size15.0 KiB
176 
4
19 
3
 
14
1
 
13
5
 
13
Other values (2)
 
9

Length

Max length1
Median length0
Mean length0.2786885246
Min length0

Characters and Unicode

Total characters68
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row4
3rd row
4th row5
5th row

Common Values

Value    Count  Frequency (%)
(empty)  176    72.1%
4        19     7.8%
3        14     5.7%
1        13     5.3%
5        13     5.3%
2        5      2.0%
6        4      1.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
419
27.9%
314
20.6%
113
19.1%
513
19.1%
25
 
7.4%
64
 
5.9%

Most occurring characters

ValueCountFrequency (%)
419
27.9%
314
20.6%
513
19.1%
113
19.1%
25
 
7.4%
64
 
5.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number68
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
419
27.9%
314
20.6%
513
19.1%
113
19.1%
25
 
7.4%
64
 
5.9%

Most occurring scripts

ValueCountFrequency (%)
Common68
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
419
27.9%
314
20.6%
513
19.1%
113
19.1%
25
 
7.4%
64
 
5.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII68
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
419
27.9%
314
20.6%
513
19.1%
113
19.1%
25
 
7.4%
64
 
5.9%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Memory size15.1 KiB
178 
1
37 
99
19 
11
 
7
4
 
1
Other values (2)
 
2

Length

Max length2
Median length0
Mean length0.3770491803
Min length0

Characters and Unicode

Total characters92
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.2%

Sample

1st row
2nd row1
3rd row
4th row11
5th row

Common Values

Value    Count  Frequency (%)
(empty)  178    73.0%
1        37     15.2%
99       19     7.8%
11       7      2.9%
4        1      0.4%
2        1      0.4%
3        1      0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
137
56.1%
9919
28.8%
117
 
10.6%
41
 
1.5%
21
 
1.5%
31
 
1.5%

Most occurring characters

ValueCountFrequency (%)
151
55.4%
938
41.3%
31
 
1.1%
41
 
1.1%
21
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number92
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
151
55.4%
938
41.3%
31
 
1.1%
41
 
1.1%
21
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Common92
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
151
55.4%
938
41.3%
31
 
1.1%
41
 
1.1%
21
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII92
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
151
55.4%
938
41.3%
31
 
1.1%
41
 
1.1%
21
 
1.1%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct16
Distinct (%)6.6%
Missing0
Missing (%)0.0%
Memory size15.0 KiB
225 
ARTESUNATO + MEFLOQUINA
 
3
ARTESUNATO+MEFLOQUINA
 
2
ARTESUNATO E MEFLOQUINA
 
2
ARTESU+MEFL+CLORIDRATO
 
1
Other values (11)
 
11

Length

Max length30
Median length0
Mean length1.823770492
Min length0

Characters and Unicode

Total characters445
Distinct characters24
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique12 ?
Unique (%)4.9%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value                           Count  Frequency (%)
(empty)                         225    92.2%
ARTESUNATO + MEFLOQUINA         3      1.2%
ARTESUNATO+MEFLOQUINA           2      0.8%
ARTESUNATO E MEFLOQUINA         2      0.8%
ARTESU+MEFL+CLORIDRATO          1      0.4%
ART+MEF+CLORIDRATO              1      0.4%
ARTESUNARO + MEFLOQUINA         1      0.4%
CLOROQUINA 3 E PRIMAQUINA 11 D  1      0.4%
ARTESUNATO120+CLINDAMICINA45O   1      0.4%
CLOROQUINA3DIAS PRIMAQUINA14DI  1      0.4%
Other values (6)                6      2.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
artesunato6
14.0%
mefloquina6
14.0%
4
 
9.3%
e4
 
9.3%
primaquina2
 
4.7%
artesunato+mefloquina2
 
4.7%
cloroq2
 
4.7%
mefloquina+artesunato1
 
2.3%
artesu+mefl+cloridrato1
 
2.3%
31
 
2.3%
Other values (14)14
32.6%

Most occurring characters

ValueCountFrequency (%)
A54
12.1%
O38
 
8.5%
E31
 
7.0%
I31
 
7.0%
R30
 
6.7%
U30
 
6.7%
N30
 
6.7%
T28
 
6.3%
24
 
5.4%
M20
 
4.5%
Other values (14)129
29.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter389
87.4%
Space Separator24
 
5.4%
Math Symbol17
 
3.8%
Decimal Number15
 
3.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A54
13.9%
O38
9.8%
E31
8.0%
I31
8.0%
R30
 
7.7%
U30
 
7.7%
N30
 
7.7%
T28
 
7.2%
M20
 
5.1%
Q20
 
5.1%
Other values (6)77
19.8%
Decimal Number
ValueCountFrequency (%)
16
40.0%
33
20.0%
02
 
13.3%
42
 
13.3%
21
 
6.7%
51
 
6.7%
Space Separator
ValueCountFrequency (%)
24
100.0%
Math Symbol
ValueCountFrequency (%)
+17
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin389
87.4%
Common56
 
12.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
A54
13.9%
O38
9.8%
E31
8.0%
I31
8.0%
R30
 
7.7%
U30
 
7.7%
N30
 
7.7%
T28
 
7.2%
M20
 
5.1%
Q20
 
5.1%
Other values (6)77
19.8%
Common
ValueCountFrequency (%)
24
42.9%
+17
30.4%
16
 
10.7%
33
 
5.4%
02
 
3.6%
42
 
3.6%
21
 
1.8%
51
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII445
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A54
12.1%
O38
 
8.5%
E31
 
7.0%
I31
 
7.0%
R30
 
6.7%
U30
 
6.7%
N30
 
6.7%
T28
 
6.3%
24
 
5.4%
M20
 
4.5%
Other values (14)129
29.0%

DTRATA
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct51
Distinct (%)20.9%
Missing0
Missing (%)0.0%
Memory size15.1 KiB
None
177 
2015-02-19
 
6
2015-03-02
 
4
2015-02-12
 
3
2015-05-05
 
3
Other values (46)
51 

Length

Max length10
Median length4
Mean length5.647540984
Min length4

Characters and Unicode

Total characters1378
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique42 ?
Unique (%)17.2%

Sample

1st rowNone
2nd row2015-01-06
3rd rowNone
4th row2015-01-09
5th rowNone

Common Values

Value              Count  Frequency (%)
None               177    72.5%
2015-02-19         6      2.5%
2015-03-02         4      1.6%
2015-02-12         3      1.2%
2015-05-05         3      1.2%
2015-03-05         3      1.2%
2015-02-23         2      0.8%
2015-01-12         2      0.8%
2015-04-06         2      0.8%
2015-04-07         1      0.4%
Other values (41)  41     16.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none177
72.5%
2015-02-196
 
2.5%
2015-03-024
 
1.6%
2015-02-123
 
1.2%
2015-05-053
 
1.2%
2015-03-053
 
1.2%
2015-02-232
 
0.8%
2015-01-122
 
0.8%
2015-04-062
 
0.8%
2015-04-071
 
0.4%
Other values (41)41
 
16.8%

Most occurring characters

ValueCountFrequency (%)
N177
12.8%
o177
12.8%
n177
12.8%
e177
12.8%
0155
11.2%
-134
9.7%
1121
8.8%
2109
7.9%
582
6.0%
321
 
1.5%
Other values (5)48
 
3.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number536
38.9%
Lowercase Letter531
38.5%
Uppercase Letter177
 
12.8%
Dash Punctuation134
 
9.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0155
28.9%
1121
22.6%
2109
20.3%
582
15.3%
321
 
3.9%
915
 
2.8%
412
 
2.2%
610
 
1.9%
86
 
1.1%
75
 
0.9%
Lowercase Letter
ValueCountFrequency (%)
o177
33.3%
n177
33.3%
e177
33.3%
Uppercase Letter
ValueCountFrequency (%)
N177
100.0%
Dash Punctuation
ValueCountFrequency (%)
-134
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin708
51.4%
Common670
48.6%

Most frequent character per script

Common
ValueCountFrequency (%)
0155
23.1%
-134
20.0%
1121
18.1%
2109
16.3%
582
12.2%
321
 
3.1%
915
 
2.2%
412
 
1.8%
610
 
1.5%
86
 
0.9%
Latin
ValueCountFrequency (%)
N177
25.0%
o177
25.0%
n177
25.0%
e177
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1378
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N177
12.8%
o177
12.8%
n177
12.8%
e177
12.8%
0155
11.2%
-134
9.7%
1121
8.8%
2109
7.9%
582
6.0%
321
 
1.5%
Other values (5)48
 
3.5%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing244
Missing (%)100.0%
Memory size2.0 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
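
The calculation described above can be sketched in plain Python. This is an illustrative sketch only, not the code the report generator runs; the function name `pearson_r` is hypothetical, and returning NaN for a constant input is an assumption made here for convenience.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors of covariance and standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # assumption: r is treated as undefined for a constant variable
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    xs = [1, 2, 4, 7]
    ys = [3, 1, 5, 2]
    # Shifting and rescaling either variable leaves r unchanged (up to rounding).
    print(pearson_r(xs, ys))
    print(pearson_r([10 * x + 3 for x in xs], ys))
```

Because r depends only on standardized deviations, the two printed values agree up to floating-point rounding, which is the location and scale invariance mentioned above.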

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
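
As a rough sketch of that procedure, the snippet below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption; the report itself does not say how ties are handled.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # assumption: undefined when one variable is constant
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # A monotonic but nonlinear relationship (y = x**3).
    print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Only the ordering of the values matters here, so any strictly increasing transformation of either variable leaves the result unchanged.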

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
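
A minimal sketch of this pair-counting definition follows. It implements the formula exactly as stated in the text, which corresponds to the variant usually called tau-a; tie-adjusted variants such as tau-b exist, and the report does not say which one it uses. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # Pairs: two concordant, one discordant, out of three.
    print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The loop is quadratic in the number of observations, which is fine for a dataset of this size (244 rows) but would call for a faster algorithm on large inputs.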

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
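
To make the bias correction concrete, here is a hedged sketch of the commonly cited form of Bergsma's corrected estimator, built from a chi-squared statistic over the contingency table of the two variables. It is an assumption that this matches the report's internal computation in every detail; the function name `cramers_v_corrected` and the NaN return for a single-valued column are choices made here for illustration.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    joint = Counter(zip(xs, ys))   # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return float("nan")  # assumption: undefined when a variable has one category
    return math.sqrt(phi2_corr / denom)


if __name__ == "__main__":
    labels = ["a", "b"] * 50
    print(cramers_v_corrected(labels, labels))  # identical columns
```

With the correction, two independent variables yield exactly 0 whenever the uncorrected phi-squared falls below its expected value under independence, rather than a small positive number.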

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
02B542015-01-0220145320153333045554763212014-12-312014532012-07-164002M6410333304551NaT2111102015-01-021NaNNoneNaT
12B542015-01-0620150120153333045522883382014-12-222014521979-05-114035F5108333300701NaT221105110212AM11302602015-01-0645040.0412015-01-06NaT
22B542015-01-0820150120153333061032116492014-12-202014511947-07-184067M6108333306101NaT241102015-01-081NaNNoneNaT
32B542015-01-0920150120153333045522733652015-01-052015011976-07-174038M6207333304551NaT78230511121231LUAN2015-01-09211307.05112015-01-09NaT
42B542015-01-0920150120153333045530059922015-01-082015011984-02-214030M6408333304551NaT2621052102102015-01-091NaNNoneNaT
52B542015-01-1220150220153333045522883382015-01-052015011974-04-274040M69333303301NaT1113121282015-01-12216.0199ARTESUNARO + MEFLOQUINA2015-01-12NaT
62B542015-01-1220150220153333045522706092015-01-092015011974-04-204040F5909333304551NaT51612511021231LUAN2015-01-122467.0399MEFLOQUINA+ARTESUNATO2015-01-12NaT
72B542015-01-1220150220153333061032116492015-01-102015011949-07-184065M6108333306101NaT1414052112102015-01-121NaNNoneNaT
82B542015-01-1320150220153333045554628862015-01-112015021962-04-194052F6406333304551NaT2101102015-01-131NaNNoneNaT
92B542015-01-1920150320153333045522883382015-01-152015021981-08-054033M6108333306301NaT110212222015-01-1922320.04112015-01-19NaT

Last rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
2342B542015-12-1120154920153333045522883382015-12-082015491978-06-244037M6108333304551NaT253105110112282015-12-112NaN599ARTESUNATO2015-12-11NaT
2352B542015-12-1120154920153333024022765342015-12-082015491958-01-134057M6109333302401NaT2111202015-12-111NaNNoneNaT
2362B542015-12-1420155020153333045554628862015-12-112015491964-02-264051F6109333304551NaT3541452102102015-12-141NaNNoneNaT
2372B542015-12-1420155020153333045522883382015-12-042015481980-02-234035M6108333303301NaT5211102101102015-12-141NaNNoneNaT
2382B542015-12-1520155020153333045522883382015-11-122015451943-07-174072M6106333304551NaT2111102015-12-151NaNNoneNaT
2392B542015-12-1620155020153333045522883382015-12-122015491943-07-154072M6908333304551NaT2144052101102015-12-161NaNNoneNaT
2402B542015-12-1720155020153333045554628862015-12-162015501958-05-084057M6107333304551NaT2101102015-12-171NaNNoneNaT
2412B542015-12-1820155020153333045522883382015-09-172015371987-07-314028M6906333304551NaT711405153121382015-12-1845680.0412015-12-18NaT
2422B542015-12-2120155120153333045522883382015-12-142015501962-09-034053M6909333304551NaT2101102015-12-211NaNNoneNaT
2432B542015-12-2620155120153333045522695462015-12-252015511969-07-114046M6909333304551NaT998999291102015-12-261NaNNoneNaT